from:
http://www.rebeccabarter.com/blog/2019-08-19_purrr/
https://adv-r.hadley.nz/functionals.html#purrr-style
# to download the data directly:
gapminder_orig <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv")
# define a copy of the original dataset that we will clean and play with
gapminder <- gapminder_orig
[1] "country" "year" "pop" "continent" "lifeExp" "gdpPercap"
[1] 1704 6
[1] "character"
country year pop continent lifeExp gdpPercap
"character" "integer" "numeric" "character" "numeric" "numeric"
modify() returns output of the same type as its input, so it is not a suitable choice in this case
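A minimal sketch of that difference, using a small hypothetical data frame `df` instead of the gapminder data: map_chr() collapses the result to a named character vector, while modify() hands back a data frame.

```r
library(purrr)

# Hypothetical toy data frame (not the gapminder data)
df <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)

# map_chr() returns a named character vector, one entry per column
col_classes <- map_chr(df, class)

# modify() returns the same type as its input: still a data frame,
# with every column converted by the function
df_chr <- modify(df, as.character)
class(df_chr)
```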
country year pop continent lifeExp gdpPercap
142 12 1704 5 1626 1704
Make sure to refer to .x inside the function, otherwise it will not act on each column
Note: the column names are missing from the output above
Adding column names
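One way to keep the column names, sketched on a hypothetical toy data frame and assuming purrr's map_dfr() with its `.id` argument (the exact call used in these notes is not shown):

```r
library(purrr)
library(dplyr)

# Hypothetical toy data frame (not the gapminder data)
df <- data.frame(a = c(1, 1, 2), b = c("x", "y", "z"), stringsAsFactors = FALSE)

# Each column is passed to the function as .x; .id = "variable"
# stores the name of the column in a new "variable" column
summary_df <- map_dfr(
  df,
  ~ data.frame(n_distinct = n_distinct(.x), class = class(.x)),
  .id = "variable"
)
summary_df
```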
Defining .x and .y
# assumed definition (not shown in these notes): every distinct continent/year pair
continent_year <- gapminder %>%
  distinct(continent, year)
continents <- continent_year %>%
  pull(continent) %>%
  as.character()
years <- continent_year %>%
  pull(year)
.x <- continents[1]
.y <- years[1]
gapminder %>%
  filter(continent == .x,
         year == .y) %>%
  ggplot() +
  geom_point(aes(x = gdpPercap, y = lifeExp, col = country)) +
  ggtitle(paste(.x, .y))
Applying the test code above to every continent/year pair
plot_list <- map2(.x = continents, .y = years,
                  .f = ~{
                    gapminder %>%
                      filter(continent == .x,
                             year == .y) %>%
                      ggplot() +
                      geom_point(aes(x = gdpPercap, y = lifeExp, col = country)) +
                      ggtitle(paste(.x, .y))
                  })
[[1]]
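The pairwise iteration that map2() performs can be seen without any plotting. A minimal sketch with two hypothetical short vectors: the first elements are combined, then the second elements, and so on.

```r
library(purrr)

# Hypothetical inputs of equal length
continents <- c("Asia", "Europe")
years <- c(1952, 2007)

# .x takes values from continents, .y from years, position by position
titles <- map2_chr(continents, years, ~ paste(.x, .y))
titles
```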
Below I nest the gapminder data by continent.
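The nesting code itself did not survive in these notes. A minimal sketch on a hypothetical toy tibble, assuming dplyr's group_by() and tidyr's nest(), could look like this:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data (not the gapminder data)
toy <- tibble(
  continent = c("Asia", "Asia", "Europe"),
  lifeExp   = c(60, 70, 75)
)

# One row per continent; the remaining columns are packed into a
# list-column named "data"
toy_nested <- toy %>%
  group_by(continent) %>%
  nest()

toy_nested$data[[1]]  # the Asia rows as a tibble
```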
To pull or extract from it by index
[1] "Asia" "Europe" "Africa" "Americas" "Oceania"
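One option for extracting by name and index is purrr's pluck(). This is a sketch on a hypothetical nested tibble, and the notes may have used a different call.

```r
library(purrr)
library(tibble)

# Hypothetical nested tibble with a list-column "data"
nested <- tibble(
  continent = c("Asia", "Europe"),
  data = list(tibble(lifeExp = c(60, 70)), tibble(lifeExp = 75))
)

# pluck(x, "data", 1) is equivalent to x$data[[1]]
first_data <- pluck(nested, "data", 1)

# Pull a whole column by name
continents <- pluck(nested, "continent")
```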
To pull or extract data from it by index
Since map() itself returns a list, we need to pull the result out of the list-column
tibble(list_col = list(c(1, 5, 7),
                       5,
                       c(10, 10, 11))) %>%
  mutate(list_sum = map(.x = list_col, .f = sum)) %>%
  pull(list_sum)
[[1]]
[1] 13
[[2]]
[1] 5
[[3]]
[1] 31
It can be better to return a vector instead of a list
tibble(list_col = list(c(1, 5, 7),
                       5,
                       c(10, 10, 11))) %>%
  mutate(list_sum = map_dbl(.x = list_col, .f = sum))
How to get the mean from a list-column of a tibble
[1] 60.0649
Now applying the mean function to every element of the tibble's list-column
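A minimal sketch of this step, assuming a hypothetical nested tibble with a `lifeExp` column inside each element of `data`: map_dbl() inside mutate() computes one mean per row.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical nested tibble (not the gapminder data)
nested <- tibble(
  continent = c("Asia", "Europe"),
  data = list(tibble(lifeExp = c(60, 70)), tibble(lifeExp = c(75, 77)))
)

# .x is one data frame from the list-column at a time
nested_avg <- nested %>%
  mutate(avg_lifeExp = map_dbl(data, ~ mean(.x$lifeExp)))
nested_avg
```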
gapminder_nested <- gapminder_nested %>%
  mutate(lm_obj = map(data, ~lm(lifeExp ~ pop + gdpPercap + year, data = .x)))
gapminder_nested
gapminder_nested <- gapminder_nested %>%
  mutate(pred = map2(.x = lm_obj, .y = data, function(.x, .y) predict(.x, .y)))
gapminder_nested
can also be written as
gapminder %>%
  group_by(continent) %>%
  nest() %>%
  mutate(lm_obj = map(data, ~lm(lifeExp ~ pop + year + gdpPercap, data = .))) %>%
  mutate(lm_tidy = map(lm_obj, broom::tidy))
gapminder %>%
  group_by(continent) %>%
  nest() %>%
  mutate(lm_obj = map(data, ~lm(lifeExp ~ pop + year + gdpPercap, data = .))) %>%
  mutate(lm_tidy = map(lm_obj, broom::tidy)) %>%
  ungroup() %>%
  transmute(continent, lm_tidy) %>%
  unnest(cols = c(lm_tidy))
This will split the data frame on the basis of the factor levels of the variable provided
$Africa
$Americas
$Asia
$Europe
$Oceania
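A small sketch of what split() produces, using a hypothetical toy data frame: the result is a named list with one data frame per level of the splitting variable.

```r
# Hypothetical toy data frame (not the gapminder data)
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe"),
  lifeExp   = c(60, 70, 75)
)

# One list element per distinct continent, named after it
toy_list <- split(toy, toy$continent)
names(toy_list)
toy_list$Asia
```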
set.seed(23489)
gapminder_list <- gapminder %>%
  split(gapminder$continent) %>%
  map(~sample_n(., 5))
gapminder_list
$Africa
$Americas
$Asia
$Europe
$Oceania
keep() is a function to filter a list, keeping only the elements (here data frames) that satisfy a condition
discard() is the opposite of keep()
$Americas
$Europe
$Oceania
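A minimal sketch of keep() and discard() on a hypothetical named list of data frames. The threshold of 70 and the values are illustrative only.

```r
library(purrr)

# Hypothetical list of data frames, one per continent
toy_list <- list(
  Africa = data.frame(lifeExp = c(50, 55)),
  Europe = data.frame(lifeExp = c(75, 78))
)

# keep() retains elements for which the predicate is TRUE
kept <- keep(toy_list, ~ mean(.x$lifeExp) > 70)

# discard() drops those same elements and returns the rest
dropped <- discard(toy_list, ~ mean(.x$lifeExp) > 70)

names(kept)
names(dropped)
```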
reduce() is designed to combine (reduce) all of the elements of a list into a single object by iteratively applying a binary function (a function that takes two inputs).
[1] 6
accumulate() also returns the intermediate values.
[1] 1 3 6
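The two results above are consistent with the following sketch, which sums the vector c(1, 2, 3) (an assumed input, since the original call is not shown):

```r
library(purrr)

# reduce(): ((1 + 2) + 3), only the final value is returned
total <- reduce(c(1, 2, 3), sum)

# accumulate(): the same steps, keeping every intermediate value
running <- accumulate(c(1, 2, 3), sum)

total
running
```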
reduce() can be useful for combining data frames with repeated left_join() calls, or for doing a repeated rbind()
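Both uses can be sketched on hypothetical toy data frames. The join key `id` and all values are illustrative assumptions.

```r
library(purrr)
library(dplyr)

# Repeated rbind(): stack a list of data frames row-wise
df_list <- list(
  data.frame(id = 1, value = "a"),
  data.frame(id = 2, value = "b"),
  data.frame(id = 3, value = "c")
)
stacked <- reduce(df_list, rbind)

# Repeated left_join(): add the columns of each table by a shared key
tables <- list(
  data.frame(id = 1:2, x = c(10, 20)),
  data.frame(id = 1:2, y = c("a", "b")),
  data.frame(id = 1:2, z = c(TRUE, FALSE))
)
joined <- reduce(tables, ~ left_join(.x, .y, by = "id"))
```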
every(), some()
For instance, to ask whether every continent has an average life expectancy greater than 70, you can use every(); some() asks whether at least one does
[1] FALSE
[1] TRUE
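A minimal sketch of every() and some() on a hypothetical list of data frames. With one element below the threshold and one above it, every() is FALSE and some() is TRUE.

```r
library(purrr)

# Hypothetical list of data frames, one per continent
toy_list <- list(
  Africa = data.frame(lifeExp = c(50, 55)),
  Europe = data.frame(lifeExp = c(75, 78))
)

# TRUE only if the predicate holds for all elements
all_above <- every(toy_list, ~ mean(.x$lifeExp) > 70)

# TRUE if the predicate holds for at least one element
any_above <- some(toy_list, ~ mean(.x$lifeExp) > 70)
```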